When Big Data Was Small by Richard D. Cramer

When Big Data Was Small by Richard D. Cramer

Author:Richard D. Cramer [Cramer, Richard D.]
Language: eng
Format: epub
Publisher: University of Nebraska Press


Fig. 3. The first published depiction of the comparative molecular field analysis process, from the proceedings of the Istria EuroQSAR conference in 1986. Courtesy of the author.

15

Comparative Molecular Field Analysis

My importance to Tripos resulted from my creation of comparative molecular field analysis, which for the next twenty years ensured Tripos’s leadership among maybe a dozen erstwhile competitors, many perhaps better funded or managed. Today, more than thirty years later, its use is still cited many hundreds of times each year, even though it is no longer commercially available.

Since many scientific methodologies are baptized by publication in a major journal, for this memoir and for my own amusement, I’ll try to provide an objective assessment of CoMFA’s importance by comparing the Google Scholar citation counts for CoMFA’s introductory publication (search term: “Cramer CoMFA”),1 for each of the twenty-eight years through 2015 (when I finished writing the first draft of this memoir), with those for the seminal QSAR paper (search term: “Hansch Fujita”)2 and for the most widely admired method today, docking to a receptor (search term: “Kuntz Dock”).3 Rather to my surprise as well as pleasure, the rate of CoMFA citations had steadily increased until around 2010, to a peak of 350 per year, but with its unavailability this rate is of course declining. Hansch citation rates (1964) are still increasing linearly at a lower rate though over twice as many years, with a 2015 maximum of 241 per year. Docking citations (1982) started off slowly, surpassing 10 per year only in 1990, then growing very rapidly to around 400 per year in 2005, and finally plateauing to 363 per year in 2015. While there’s a lot of apples-to-oranges involved here, comparison of these citation rates does seem to support my pride in having created CoMFA.

By 1985 all the pieces that would be needed to make CoMFA a reality had finally become available to me: the original idea of correlating the biological potencies of candidate 3D drug molecules with the fields they exerted at various locations in space; the Dabyl framework, whose coupling of molecular tables with interactive 3D graphics should simplify the initial creation and manipulation of those 3D molecules; and a PLS implementation by Ildiko Frank that hopefully would surmount the “many times more columns than rows” obstacle of using field intensities as QSAR descriptors. So I drew up a SBIR Phase I application for “Correlation of Dynamic Molecular Shape with Activity.” Again, I didn’t wait for its explicit funding, and long before the end of 1985 I had constructed an assemblage of these pieces, ready for trial.

Validation of a new computational methodology requires at least two inputs, an objective measure of success and an appropriate data set. Ildiko had introduced me to cross-validation, now a universally recognized objective measure of success.4 One by one, each of the molecules in a data set is omitted, and the new methodology is applied to all the other molecules. Using the resulting model, the biological potency of the omitted molecule is predicted.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.